SPECIFICATION 

IBM Docket No. SVL9-2002-0058-US1 
TO ALL WHOM IT MAY CONCERN: 

BE IT KNOWN that we, Teodoro Cipresso, of San Jose, California and citizen of the United States, 
Andrzej McColl Krasun, of Broughton, United Kingdom and citizen of the United Kingdom, Gary 
Isaak Mazo, of San Jose, California and citizen of the United States, and William Nicholas John 
Tindall, of San Martin, California and citizen of the United Kingdom, have invented new and useful 
improvements in 



XML TO NUMERIC CONVERSION METHOD, SYSTEM, ARTICLE OF 
MANUFACTURE, AND COMPUTER PROGRAM PRODUCT 



of which the following is a specification: 
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XML TO NUMERIC CONVERSION METHOD, SYSTEM, ARTICLE OF 
MANUFACTURE, AND COMPUTER PROGRAM PRODUCT 



A portion of the Disclosure of this patent document contains material which is subject to 
copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone 
of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office 
patent file or records, but otherwise reserves all copyright rights whatsoever. 
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1 

2 BACKGROUND OF THE INVENTION 

3 
4 

5 1. Field of the Invention 

6 

7 [0001] The present invention relates in general to computer programs, and more particularly to 

8 converting a textual representation of a number into a numeric representation of the number. 
9 

10 

11 2. Description of the Related Art 

12 

13 [0002] Tagged message formats such as Extended Markup Language (XML) are rapidly replacing 

14 fixed-format message formats that typify Electronic Data Interchange (EDI) because the tagged 

15 message formats are so much more flexible. However, the fixed-format messages are much more 

16 efficient from storage and processing perspectives as they may be typically be processed hundreds, or 

17 even thousands of times, more quickly than corresponding tagged format messages. For example, 

18 information describing a customer account may be represented by the following EDI fixed-format 

19 message (where the "b" characters represent spaces): 

20 03 17789362JohnbbbbbbbbbbQCitizenbbbbbbbb0003475.990000069.23 

21 The same information may be represented by the following XML tagged format message: 

22 <?xml version="1.0"?> 

23 <Account> 

24 <Number>031-778936-2</Number> 

25 <Name> 

26 <First>John</First> 

27 <MI>Q</MI> 

28 <Last>Citizeno 

29 </Name> 

30 <01dBalance>3475.75</01dBalance> 

31 <NewBalance>69.23</NewBalance> 
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i </Account> 
2 

3 [0003] Although the XML tagged format is self-evidently clearer and more flexible, processing it 

4 is far less efficient than processing fixed-format data. There are various reasons why processing 

5 tagged messages is slower. Two of the main reasons are: 

6 1. The receiving program has to analyze the message, character by character, to distinguish 

7 markup (the tags) from message content. This process is called "parsing," and is 

8 computationally very expensive. 

9 2. The message content itself (between the markup tags) typically does not have a 

10 fixed-format, and this format must be determined before the content can be processed. 

1 1 Discovering the format by using conventional techniques is also computationally expensive. 
12 

13 [0004] In a fixed-format message, a numeric quantity is always the same size, and has the same 

14 number of integer and decimal places and so on. In a tagged message format, the representation of 

15 the same numeric quantity can vary very widely. For example, with a fixed format of six integer 

16 places, a decimal point and two decimal places, the number 20 would always be represented exactly 

17 as: 000020.00. In the unconstrained formats that are typical for tagged format messages, the number 

18 20 may be represented in a variety of ways: 20, 20., 20.00, 000020.00, et cetera. Such variability is 

19 one of the benefits of using tagged format messages as the sender and receiver of a message do not 

20 need to have identical definitions, and can evolve separately. However, this flexibility may come at 

21 a penalty in performance if the receiver of the message uses a conventional method of acquiring the 

22 incoming data values. For instance, a generalized numeric conversion function may accept any of 

23 the illustrated formats, converting them to a standardized representation that could then be assigned 

24 to the appropriate program variable. Unfortunately, this may require over a thousand machine 

25 instructions. 
26 

27 [0005] Thus, there is a clearly felt need for an improved conversion of a textual representation of a 

28 number into a numeric representation of the number. 
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1 SUMMARY OF THE INVENTION 

2 

3 [0006] Preferred embodiments of the present invention comprise a method, system, article of 

4 manufacture, and computer program product for converting a textual representation of a number into 

5 a numeric representation of the number. 
6 

7 [0007] In accordance with a preferred embodiment of the present invention, a text representation of 

8 a number is converted into a numeric representation of the number by converting the text 

9 representation of the number into a description of the number's format; mapping the description of 

10 the number's format to a sequence of conversion code; and converting the text representation of the 

1 1 number into the numeric representation of the number by use of the sequence of conversion code. 
12 

13 [0008] In accordance with an aspect of a preferred embodiment of the present invention, the 

14 description of the number's format is a picture string. 
15 

16 [0009] In accordance with another aspect of a preferred embodiment of the present invention, the 

17 text representation of the number is converted into a description of the number's format by a 

18 translation instruction using a translate table. 
19 

20 [0010] In accordance with another aspect of a preferred embodiment of the present invention, the 

21 sequence of conversion code for converting the text representation of the number into the numeric 

22 representation of the number comprises an assignment statement. 
23 

24 [0011] In accordance with another aspect of a preferred embodiment of the present invention, the 

25 mapping of the description of the number's format to a sequence of conversion code comprises 

26 mapping the description of the number's format to an index which is used to transfer control to the 

27 sequence of conversion code corresponding to the description of the number's format. 
28 

29 [0012] In accordance with another aspect of a preferred embodiment of the present invention, if the 

30 text representation of the number does not convert into the description of the number's format, then 

31 the subsequent mapping and converting steps are not executed. 
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32 

33 [0013] A preferred embodiment of the present invention has the advantage of providing improved 

34 conversion of a textual representation of a number into a numeric representation of the number. 
35 

36 [0014] A preferred embodiment of the present invention has the advantage of reducing execution 

37 time for conversion of a textual representation of a number into a numeric representation of the 

38 number 
39 

40 [0015] A preferred embodiment of the present invention has the advantage of reducing memory for 

41 conversion of a textual representation of a number into a numeric representation of the number. 
42 

43 [0016] A preferred embodiment of the present invention has the advantage of reducing an amount 

44 of program code for conversion of a textual representation of a number into a numeric representation 

45 of the number. 
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1 BRIEF DESCRIPTION OF THE DRAWINGS 

2 

3 [0017] For a more complete understanding of the present invention and the advantages thereof, 

4 reference is now made to the Description of the Preferred Embodiment in conjunction with the 

5 attached Drawings, in which: 
6 

7 Figure 1 is a block diagram of a preferred embodiment of the present invention; 

8 

9 Figure 2 illustrates a sample use of preferred embodiment of the present invention; 

10 

1 1 Figure 3 is a flowchart of method steps preferred in carrying out a preferred embodiment of 

12 the present invention; and 
13 

14 Figure 4 is a block diagram of a computer system used in performing a method of a 

15 preferred embodiment of the present invention, forming part of an apparatus of a preferred 

16 embodiment of the present invention, storing a data structure of a preferred embodiment of the 

17 present invention, and which may use an article of manufacture comprising a computer-readable 

18 storage medium having a computer program embodied in said medium which may cause the 

19 computer system to practice a preferred embodiment of the present invention. 
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1 DESCRIPTION OF THE PREFERRED EMBODIMENT 

2 

3 [0018] An embodiment of the invention is now described with reference to the figures where like 

4 reference numbers indicate identical or functionally similar elements. Also in the figures, the left 

5 most digit of each reference number corresponds to the figure in which the reference number is first 

6 used. While specific configurations and arrangements are discussed, it should be understood that 

7 this is done for illustrative purposes only. A person skilled in the relevant art will recognize that 

8 other configurations and arrangements can be used without departing from the spirit and scope of the 

9 invention. It will be apparent to a person skilled in the relevant art that this invention can also be 
10 employed in a variety of other devices and applications. 

11 

12 [0019] The preferred embodiment of the present invention directs control for a given input value to 

13 a set of previously compiled program statements that process the exact format of each particular 

14 instance of the input value. This is accomplished in three major steps. First, each digit in the textual 

15 representation input value is converted to the digit 9, which can be accomplished with a single 

16 machine instruction on many models of IBM mainframe computers. This converted value is a valid 

17 descriptor, the picture string. For example, if the textual representation input value is 0000020.00, 

18 then its descriptor or picture string is 9999999.99 and its length is 10. 
19 

20 [0020] An efficient way of acquiring a numeric value from a message and assigning it to a program 

21 variable is to use a compiled language assignment statement, such as a COBOL MOVE statement. 

22 However, an individual MOVE statement is specific to and only works correctly for a particular 

23 number format. Thus, if a set of suitable MOVE statements is provided, each one for a particular 

24 format, and if the program can determine the particular format, then the program can transfer control 

25 to the appropriate MOVE statement. The problem is discovering the format of the number and 

26 transferring control in a manner that the advantages of the high performance compiled MOVE 

27 statement are not lost due to the cost of determining the format of the number. Ideally, the format of 

28 the number would be instantaneously determined at no cost to provide a description of the format of 

29 the number in a form that could be used to direct control to the appropriate compiled MOVE 

30 statement. The invention significantly reduces the cost associated with such a determination. 
31 
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1 [0021] In the preferred embodiment of the present invention, a COBOL MOVE statement is used to 

2 convert the textual representation of a number into the numeric representation of the number. The 

3 COBOL language can describe the format of a data item in terms of a "picture string." In the 

4 picture string for a numeric quantity, each decimal digit position is represented by the character '9', 

5 the decimal point by the character V and so on. For example, the picture string description for the 

6 number "12345.67" is "99999.99". The data descriptions and the MOVE statement that can 

7 correctly assign this number to an operational data item are as follows: 

8 01 NUMBER-IN-MESSAGE USAGE DISPLAY PICTURE 99999.99. 

9 01 OPERATIONAL-DATA-ITEM USAGE COMPUTATIONAL PICTURE 9999999V99. 
10 

1 1 MOVE NUMBER-IN-MESSAGE TO OPERATIONAL-DATA-ITEM 

12 

13 [0022] However, this MOVE statement only correctly converts the textual representation of the 

14 number into the numeric representation of the number if the textual representation of the number has 

15 exactly five integer digits, followed by one decimal point, followed by two decimal digits. The 

16 preferred embodiment of the present invention uses a novel method to transform the textual 

17 representation of the number into a description of its format in terms of its "picture" with a single 

18 machine instruction. This description is converted into a number which becomes an argument to a 

19 computed GO TO statement that passes control to a proper MOVE statement for converting the 

20 textual representation of the number into the numeric representation of the number. 
21 

22 [0023] The preferred embodiment of the present invention directs control for a given input value to 

23 a set of previously compiled program statements that process the exact format of each particular 

24 instance of the input value. This is accomplished in three major steps. First, each digit in the textual 

25 representation input value is converted to the digit 9, which can be accomplished with a single 

26 machine instruction on many models of IBM mainframe computers. This converted value is a valid 

27 descriptor, the picture string. For example, if the textual representation input value is 0000020.00, 

28 then its descriptor or picture string is 9999999.99 and its length is 10. 
29 

30 [0024] Although the COBOL language statement that the preferred embodiment uses to implement 

3 1 the conversion appears complicated: 
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"1 INSPECT PL- 1 REPLACING ALL 

2 '0' BY '9' T BY '9' '2' BY '9' '3' BY '9' '4' BY '9' 

3 '5' BY '9' '6' BY '9' 7' BY '9' '8* BY '9' ' ' BY '?' 

4 the code generated by the COBOL compiler for this statement consists of only the single machine 

5 instruction "translate", operation code "TR": 

6 TR PL- 1 , TRANSLATE-TABLE 

7 The translate table embodies the above set of character replacements wherein '0', T, '2', '3', '4', '5', '6', 

8 7', and '8' are replaced by '9', wherein a space character is replaced by '?', and wherein other 

9 characters remain unchanged. 
10 

1 1 [0025] In the second major step, the picture string produced by the first step is mapped to a number 

12 using techniques well known in the art such as a binary search or hashing. This number is then used 

13 in a computed GOTO statement to transfer control to the assignment statement that corresponds 

14 exactly with the picture string for the input value. Some examples of these assignment statements 

15 are: 
16 

17 M7-D1-W5D2D. 

18 MOVE S-Dl TO T-W5D2D 

19 GO TO CONTENT-TRANSFORMED-EXIT 
20 

21 M7-W4D2-W5D2D. 

22 MOVE S-W4D2 TO T-W5D2D 

23 GO TO CONTENT-TRANSFORMED-EXIT 
24 

25 The naming of the labels, such as "M7-W4D2-W5D2D", gives indicates the semantics of the 

26 assignments: "M7" implies 

27 that the answer will occupy 7 digit positions; "W4D2" means that the source has 4 digits before the 

28 decimal point and 2 digits after it; "W5D2D" mean that the target has 5 digits before the decimal 

29 point and 2 digits after it. 
30 

3 1 [0026] If the picture string cannot be mapped to a number, then the textual representation of the 
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1 input value does not have one of the expected formats. In this case in the third major step, control 

2 falls through to the standard library conversion routine, which can convert valid but uncommon 

3 formats. Using this standard conversion has the additional benefit that, if invalid input is received, it 

4 will not be erroneously placed automatically into the output. For example, an input value of 

5 "34AB449.12" will translate to an invalid picture string "99AB999.99" which does not match any 

6 valid picture string such as "9999999.99". 
7 

8 [0027] Referring now to Figure 1 and Figure 2, data such as an XML document 105 may contain 

9 a textual representation of a number which requires conversion into a numeric representation of the 

10 number before assignment to a program variable. The XML document 105 contains XML 

1 1 statements 205, and in particular, an XML statement 210 containing a textual representation 215 of a 

12 number which requires conversion into a numeric representation of the number before assignment to 

13 a program variable. The XML document 105 is parsed by a conventional XML parser 110 which 

14 upon parsing and identifying the textual representation 215 extracts the textual representation 220 

15 and provides it to a format generator 115. The format generator transforms the textual representation 

16 220 of the number into a description 225 of its format in terms of its "picture string". This 

17 transformation is preferably performed by a single TRANSLATE machine instruction 230 which 

18 may be produced from a compilation of a COBOL INSPECT statement 235. This description or 

19 picture string is converted into a number (120 and 240) which becomes an argument to a computed 

20 GO TO statement that passes control to a proper converter from a group of converters comprising 

21 converter 1 125, converter 2 130, converter 3 135, ... or converter N 140 containing the proper 

22 MOVE statement 245 for converting the textual representation of the number into the numeric 

23 representation (145 and 250) of the number. 
24 

25 [0028] Referring now to Figure 3, the flowchart 300 illustrates the operations preferred in carrying 

26 out the preferred embodiment of the present invention. In the flowcharts, the graphical conventions 

27 of a diamond for a test or decision and a rectangle for a process or function are used. These 

28 conventions are well understood by those skilled in the art, and the flowcharts are sufficient to enable 

29 one of ordinary skill to write code in any suitable computer programming language. 
30 

3 1 [0029] After the start 305 of the process 300, process block 310 converts the text representation of 
SVL9-2002-0058-US1 11 



1 the number into a description of the number's format, and process block 315 maps the description of 

2 the number's format to an index. Thereafter, decision block 320 determines if a valid index, hash, or 

3 search result was produced by the mapping of the description of the number's format to an index. If 

4 a valid index, hash, or search result was produced by the mapping of the description of the number's 

5 format to an index, then process block 325 determines a sequence of conversion code corresponding 

6 to the description of the number's format by use of the index. Process block 330 transfers control to 

7 the sequence of conversion code corresponding to the description of the number's format. Process 

8 block 335 converts the text representation of the number into the numeric representation of the 

9 number by use of the sequence of conversion code. Process block 340 returns the numeric 
10 representation of the number. The process ends at process block 345. 

11 

12 [0030] Returning now to decision block 320, if a valid index, hash, or search result was not 

13 produced by the mapping of the description of the number's format to an index, then control passes 

14 to process block 350 which returns an error indicating that no conversion was performed, and the 

15 process ends at process block 345. 
16 

17 [0031] With reference now to the figures, and in particular with reference to Figure 4, there is 

18 depicted a pictorial representation of a computer system 400 which may be utilized to implement a 

19 method, system, article of manufacture, data structure, and computer program product of preferred 

20 embodiments of the present invention. The block diagram of Figure 4 illustrates a computer system 

21 400 used in performing the method of the present invention, forming part of the apparatus of the 

22 present invention, and which may use the article of manufacture comprising a computer-readable 

23 storage medium having a computer program embodied in said medium which may cause the 

24 computer system to practice the present invention. The computer system 400 includes a processor 

25 402, which includes a central processing unit (CPU) 404, and a memory 406. Additional memory, in 

26 the form of a hard disk file storage 408 and a computer-readable storage device 410, is connected to 

27 the processor 402. Computer-readable storage device 410 receives a computer-readable storage 

28 medium 412 having a computer program embodied in said medium which may cause the computer 

29 system to implement the present invention in the computer system 400. The computer system 400 

30 includes user interface hardware, including a mouse 414 and a keyboard 416 for allowing user input 
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1 to the processor 402 and a display 418 for presenting visual data to the user. The computer system 

2 may also include a printer 420. 
3 

4 [0032] Using the foregoing specification, the invention may be implemented using standard 

5 programming and/or engineering techniques using computer programming software, firmware, 

6 hardware or any combination or sub-combination thereof. Any such resulting program(s), having 

7 computer readable program code means, may be embodied within one or more computer usable 

8 media such as fixed (hard) drives, disk, diskettes, optical disks, magnetic tape, semiconductor 

9 memories such as Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), etc., or 

10 any memory or transmitting device, thereby making a computer program product, i.e., an article of 

1 1 manufacture, according to the invention. The article of manufacture containing the computer 

12 programming code may be made and/or used by executing the code directly or indirectly from one 

13 medium, by copying the code from one medium to another medium, or by transmitting the code over 

14 a network. An apparatus for making, using, or selling the invention may be one or more processing 

15 systems including, but not limited to, central processing unit (CPU), memory, storage devices, 

16 communication links, communication devices, servers, input/output (I/O) devices, or any sub- 

17 components or individual parts of one or more processing systems, including software, firmware, 

18 hardware or any combination or sub-combination thereof, which embody the invention as set forth in 

19 the claims. User input may be received from the keyboard, mouse, pen, voice, touch screen, or any 

20 other means by which a human can input data to a computer, including through other programs such 

21 as application programs, databases, data sets, or files. 
22 

23 [0033] One skilled in the art of computer science will easily be able to combine the software 

24 created as described with appropriate general purpose or special purpose computer hardware to 

25 create a computer system and/or computer sub-components embodying the invention and to create a 

26 computer system and/or computer sub-components for carrying out the method of the invention. 

27 Although the present invention has been particularly shown and described with reference to a 

28 preferred embodiment, it should be apparent that modifications and adaptations to that embodiment 

29 may occur to one skilled in the art without departing from the spirit or scope of the present invention 

30 as set forth in the following claims. 
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